What is Linear Regression?

Linear regression is a statistical technique for modeling the relationship between two variables. A ‘line of best fit’ drawn through the data makes the trend visible in a graph, and the fitted equation allows prediction of the dependent variable for any given value of the independent variable.
The regression equation builds on the slope-intercept form of a line:
\[ y = mx+b \]
where:
\(y\) = dependent variable
\(x\) = independent variable
\(m\) = slope
\(b\) = y-intercept
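
As a quick illustration of how the slope and intercept turn an x value into a predicted y, here is a minimal sketch in R. The values of m and b are made up for the example, and predict_line is a hypothetical helper name, not a function from any package.

```r
# Hypothetical slope and intercept, chosen only for illustration
m <- 2      # slope: change in y for a one-unit increase in x
b <- 1      # y-intercept: value of y when x is 0

# Hypothetical helper that evaluates y = mx + b
predict_line <- function(x, m, b) {
  m * x + b
}

x <- c(0, 1, 2, 3)
y <- predict_line(x, m, b)
print(y)    # 1 3 5 7
```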

Linear Regression Equation

\[ y = \beta_0 + \beta_1 \cdot x + \varepsilon \] where:
\(y\) = dependent variable
\(\beta_0\) = intercept
\(\beta_1\) = slope
\(x\) = independent variable
\(\varepsilon\) = error term
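
To see what the error term does in practice, the sketch below simulates data from this equation and then recovers the intercept and slope with lm(). The "true" parameter values, the sample size, the noise level, and the seed are all assumptions made for the illustration.

```r
set.seed(42)                      # arbitrary seed so the simulation is repeatable

# Assumed "true" parameters for the simulation
beta_0 <- 5                       # intercept
beta_1 <- 0.5                     # slope
n      <- 200                     # number of simulated observations

x       <- runif(n, min = 0, max = 100)   # independent variable
epsilon <- rnorm(n, mean = 0, sd = 2)     # error term
y       <- beta_0 + beta_1 * x + epsilon  # dependent variable

# Fit y = beta_0 + beta_1 * x by ordinary least squares
fit <- lm(y ~ x)
coef(fit)                         # estimates should land near 5 and 0.5

# The same estimates from the closed-form least-squares formulas
slope_hat     <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)
```

The estimates will not equal 5 and 0.5 exactly, because the error term scatters the points around the true line.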

Visualizing Linear Regression in RStudio: Data Import

For this example we will use the palmerpenguins data set. Collected by Dr. Kristen Gorman, it contains measurements for 344 penguins across 3 species and 3 islands in the Palmer Archipelago.

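library(palmerpenguins)  # penguins data set
library(dplyr)           # data manipulation
library(ggplot2)         # static plots
library(plotly)          # interactive plots
data(penguins)           # load the data into the session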
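
Before plotting, it can help to confirm what was loaded. The following is a small sketch, assuming the palmerpenguins package is installed. It checks the dimensions, the species and islands recorded, and how many values are missing in the two columns used in the plots. The name n_missing is introduced here for illustration.

```r
library(palmerpenguins)

# Naming the package makes it explicit which penguins data set is loaded
data(penguins, package = "palmerpenguins")

dim(penguins)                  # number of rows and columns
levels(penguins$species)       # the species recorded
levels(penguins$island)        # the islands sampled

# Count missing values in the two columns plotted in this walkthrough
n_missing <- colSums(is.na(penguins[, c("flipper_length_mm", "bill_length_mm")]))
n_missing
```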

Visualizing Linear Regression in RStudio: Constructing Plot

Code to create a simple scatter plot:

ggplot(penguins, aes(x = flipper_length_mm,
                     y = bill_length_mm,
                     color = species)) +
  geom_point() +
  labs(x = "Flipper Length (in mm)",
       y = "Bill Length (in mm)",
       color = "Species",
       title = "Comparing Penguin Bill to Flipper Length")
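
ggplot2 warns about rows it has to drop when a plotted value is missing. One way to handle this explicitly, sketched here with dplyr, is to filter out incomplete rows first and store the plot in an object so later layers can be added to it. The names penguins_complete and scatter are hypothetical, chosen for this example.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Keep only rows where both plotted measurements are present
penguins_complete <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(bill_length_mm))

# Same scatter plot as above, stored in an object so it can be reused
scatter <- ggplot(penguins_complete, aes(x = flipper_length_mm,
                                         y = bill_length_mm,
                                         color = species)) +
  geom_point() +
  labs(x = "Flipper Length (in mm)",
       y = "Bill Length (in mm)",
       color = "Species",
       title = "Comparing Penguin Bill to Flipper Length")

scatter  # printing the object draws the plot
```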

Visualizing Linear Regression in RStudio: Constructing Plot Cont.

Visualizing Linear Regression in RStudio: Adding Regression Line

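geom_smooth() adds a ‘line of best fit’. If you do not specify the method argument, it draws a smoothed curve by default. To get a straight line, set geom_smooth(method = "lm"). Because color is mapped to species, a separate line is fitted for each species, and se = FALSE hides the confidence band around each line.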

#---------- Constructs basic scatter plot ------------
ggplot(penguins, aes(x = flipper_length_mm,
                     y = bill_length_mm,
                     color = species)) +
  geom_point() +
  # Adds one linear regression line per species, without the confidence band
  geom_smooth(method = "lm",
              se = FALSE) +
  # Adds axis, legend, and title labels
  labs(x = "Flipper Length (in mm)",
       y = "Bill Length (in mm)",
       color = "Species",
       title = "Comparing Penguin Bill to Flipper Length")
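
The plot shows the fitted lines but not their equations. A minimal sketch of how the intercept and slope could be read off numerically with lm() follows, assuming the palmerpenguins package is installed. The object names fit_all and fit_by_species and the 200 mm flipper length used for the prediction are assumptions made for the example.

```r
library(palmerpenguins)

# Fit bill length as a linear function of flipper length, all species pooled
# (lm() drops rows with missing values by default)
fit_all <- lm(bill_length_mm ~ flipper_length_mm, data = penguins)
coef(fit_all)   # (Intercept) is beta_0, flipper_length_mm is beta_1

# The plot draws one line per species, so fit each species separately
fit_by_species <- lapply(split(penguins, penguins$species), function(d) {
  coef(lm(bill_length_mm ~ flipper_length_mm, data = d))
})
fit_by_species

# Predict bill length for a hypothetical 200 mm flipper using the pooled model
predict(fit_all, newdata = data.frame(flipper_length_mm = 200))
```

The per-species coefficients correspond to the lines drawn by geom_smooth(method = "lm"), while the pooled fit ignores species and so describes a single line through all the points.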

Visualizing Linear Regression in RStudio: Adding Regression Line Cont.

Interactive 3D Scatter Plot with Plotly